21 research outputs found

A Maximum-Entropy Partial Parser for Unrestricted Text

Authors: Thorsten Brants, Wojciech Skut
Publication date: 01/01/1998

This paper describes a partial parser that assigns syntactic structures to sequences of part-of-speech tags. The program uses the maximum entropy parameter estimation method, which allows a flexible combination of different knowledge sources: the hierarchical structure, parts of speech and phrasal categories. In effect, the parser goes beyond simple bracketing and recognises even fairly complex structures. We give accuracy figures for different applications of the parser.
Comment: 9 pages, LaTe

arXiv.org e-Print Archive

CiteSeerX
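
The abstract of "A Maximum-Entropy Partial Parser for Unrestricted Text" says that maximum entropy estimation lets the parser combine several knowledge sources. A minimal sketch of how a conditional maximum-entropy model turns weighted features into a probability distribution is given below. The label names, feature strings and weights are invented for illustration and are not taken from the paper; in a real system the weights would be estimated from a treebank.

```python
import math

def maxent_distribution(labels, active_features, weights):
    """Return p(label | context) for a conditional maximum-entropy model.

    Each (feature, label) pair has a weight; the score of a label is the
    sum of the weights of the features active in the current context.
    """
    scores = {
        label: sum(weights.get((feat, label), 0.0) for feat in active_features)
        for label in labels
    }
    # Normalise the exponentiated scores so that they sum to one.
    norm = sum(math.exp(s) for s in scores.values())
    return {label: math.exp(s) / norm for label, s in scores.items()}

# Hypothetical structural decisions and hand-set weights (assumptions).
LABELS = ["continue_phrase", "open_phrase", "close_phrase"]
WEIGHTS = {
    ("pos=NN", "continue_phrase"): 1.2,     # part-of-speech evidence
    ("parent=NP", "continue_phrase"): 0.8,  # phrasal-category evidence
    ("pos=NN", "open_phrase"): 0.3,
    ("depth=1", "close_phrase"): -0.5,      # hierarchical-structure evidence
}

probs = maxent_distribution(LABELS, ["pos=NN", "parent=NP", "depth=1"], WEIGHTS)
best = max(probs, key=probs.get)
```

Each knowledge source contributes features of its own, and the model simply adds their weights, which is what makes the combination flexible.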

Chunk Tagger - Statistical Recognition of Noun Phrases

Authors: Thorsten Brants, Wojciech Skut
Publication date: 01/01/1998

We describe a stochastic approach to partial parsing, i.e., the recognition of syntactic structures of limited depth. The technique utilises Markov Models, but goes beyond usual bracketing approaches, since it is capable of recognising not only the boundaries, but also the internal structure and syntactic category of simple as well as complex NPs, PPs, APs and adverbials. We compare tagging accuracy for different applications and encoding schemes.
Comment: 7 pages, LaTe

arXiv.org e-Print Archive

CiteSeerX
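
The "Chunk Tagger" abstract treats phrase recognition as a tagging problem whose accuracy depends on the encoding scheme. As a rough illustration of what such an encoding looks like, the sketch below converts flat, labelled phrase spans into one tag per token and back. It uses the common begin/inside/outside convention, which is an assumption here and not necessarily one of the schemes compared in the paper; it also captures only boundaries and categories, not the internal structure the paper recovers.

```python
def spans_to_tags(n_tokens, spans):
    """Encode (category, start, end) spans, end exclusive, as per-token tags."""
    tags = ["O"] * n_tokens
    for category, start, end in spans:
        tags[start] = "B-" + category          # first token of the phrase
        for i in range(start + 1, end):
            tags[i] = "I-" + category          # remaining tokens of the phrase
    return tags

def tags_to_spans(tags):
    """Decode per-token tags back into (category, start, end) spans."""
    spans, current = [], None                  # current = [category, start]
    for i, tag in enumerate(tags + ["O"]):     # sentinel closes the last span
        continues = (
            tag.startswith("I-")
            and current is not None
            and tag[2:] == current[0]
        )
        if current is not None and not continues:
            spans.append((current[0], current[1], i))
            current = None
        if tag.startswith("B-"):
            current = [tag[2:], i]
    return spans

# Illustrative part-of-speech sequence and phrase spans (assumed example).
pos_tags = ["ART", "ADJA", "NN", "VVFIN", "APPR", "ART", "NN"]
phrases = [("NP", 0, 3), ("PP", 4, 7)]
chunk_tags = spans_to_tags(len(pos_tags), phrases)
```

With phrases expressed as one tag per token, any sequence tagger, including a Markov model, can be trained to predict them from the part-of-speech sequence.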

Preference-Driven Bimachine Compilation: An Application to TTS Text Normalisation

Author: Wojciech Skut
Publication date: 01/11/2005

This paper describes a grammar formalism and a deterministic parser developed for text normalisation in the rVoice text-to-speech (TTS) system. The rules are formulated using regular expressions and converted into a non-deterministic finite-state transducer (FST). At runtime, search is guided by parsing preferences which the user may associate with regular operators; the best solution is determined in a way similar to the directional evaluation of constraints in Optimality Theory. During compilation, the FST is converted into a bimachine, making deterministic parsing possible.

Utrecht University Repository
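
The bimachine mentioned in the abstract of "Preference-Driven Bimachine Compilation" can be pictured as two deterministic automata, one reading the input left to right and one reading it right to left, plus an output function that sees both states at each symbol. The sketch below is a hand-written toy, not the rVoice grammar or the paper's compilation procedure, and it does not model parsing preferences. The rule it implements (rewrite "a" as "A" only if a "y" occurs earlier and the whole input ends in "x") is an invented example that needs unbounded right context.

```python
def left_step(state, ch):
    """Left automaton: remembers whether a 'y' has been read so far."""
    return "seen_y" if state == "seen_y" or ch == "y" else "no_y"

def right_step(state, ch):
    """Right automaton (reads right to left): remembers the final symbol."""
    if state == "start":
        return "ends_x" if ch == "x" else "ends_other"
    return state

def emit(left_state, ch, right_state):
    """Output function: depends on the symbol and on both automaton states."""
    if ch == "a" and left_state == "seen_y" and right_state == "ends_x":
        return "A"
    return ch

def run_bimachine(text):
    n = len(text)
    # left[i]: left-automaton state after reading text[:i].
    left = ["no_y"]
    for ch in text:
        left.append(left_step(left[-1], ch))
    # right[i]: right-automaton state after reading text[i:] from the right.
    right = ["start"] * (n + 1)
    for i in range(n - 1, -1, -1):
        right[i] = right_step(right[i + 1], text[i])
    # Each symbol is rewritten using the state to its left and to its right.
    return "".join(emit(left[i], text[i], right[i + 1]) for i in range(n))
```

Both passes are deterministic and linear in the input length, so no search or backtracking happens at runtime, even though the rule depends on the end of the string.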

Incremental Construction of Minimal Sequential Transducers: The Unsorted-Data Algorithm for Acyclic Sequential Transducers

Author: Wojciech Skut

This paper presents an efficient algorithm for the incremental construction of a minimal acyclic sequential transducer (ST) from a list of input and output strings. The algorithm generalizes a known method of constructing minimal finite-state automata (Daciuk, Mihov, Watson and Watson 2000). Unlike the algorithm published by Mihov and Maurel (2001), it does not require the input strings to be sorted in advance. The algorithm is illustrated by an application in a text-to-speech system.

CiteSeerX
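
To make the setting of the last abstract concrete, the sketch below builds an acyclic sequential transducer incrementally from (input, output) string pairs given in arbitrary order. It is a deliberately simplified illustration and not the published algorithm: it keeps a trie-shaped transducer and moves output as close to the start state as possible, but it omits the minimisation step (merging equivalent states) that the paper is about. The class and function names and the sample lexicon are assumptions, and input strings are assumed to be distinct.

```python
class State:
    def __init__(self):
        self.trans = {}     # input symbol -> (output string, next State)
        self.final = None   # final output string, or None if not accepting

def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def add_pair(root, inp, out):
    """Insert one (input, output) pair; pairs may arrive in any order."""
    state, remaining = root, out
    for sym in inp:
        if sym in state.trans:
            edge_out, nxt = state.trans[sym]
            k = common_prefix_len(edge_out, remaining)
            if k < len(edge_out):
                # The edge emits more than the new pair allows: keep the
                # shared prefix on the edge and push the surplus one level down.
                surplus = edge_out[k:]
                state.trans[sym] = (edge_out[:k], nxt)
                for s2, (o2, n2) in list(nxt.trans.items()):
                    nxt.trans[s2] = (surplus + o2, n2)
                if nxt.final is not None:
                    nxt.final = surplus + nxt.final
            remaining = remaining[k:]
            state = nxt
        else:
            # New branch: emit all remaining output on its first edge.
            nxt = State()
            state.trans[sym] = (remaining, nxt)
            remaining = ""
            state = nxt
    state.final = remaining

def transduce(root, inp):
    """Return the output for inp, or None if inp is not in the transducer."""
    state, out = root, []
    for sym in inp:
        if sym not in state.trans:
            return None
        piece, state = state.trans[sym]
        out.append(piece)
    if state.final is None:
        return None
    return "".join(out) + state.final

def build(pairs):
    root = State()
    for inp, out in pairs:
        add_pair(root, inp, out)
    return root

# Invented lexicon entries (word -> transcription), deliberately unsorted.
lexicon = [("cat", "kat"), ("a", "uh"), ("car", "kar"), ("cab", "tab")]
fst = build(lexicon)
```

Because each insertion only adjusts outputs along the path it shares with earlier entries, the resulting mapping does not depend on the order in which the pairs arrive.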